
Learning Robust Options by Conditional Value at Risk Optimization

Neural Information Processing Systems

Options are generally learned by using an inaccurate environment model (or simulator), which contains uncertain model parameters. While there are several methods to learn options that are robust against the uncertainty of model parameters, these methods only consider either the worst case or the average (ordinary) case for learning options. This limited consideration of the cases often produces options that do not work well in the unconsidered case. In this paper, we propose a conditional value at risk (CVaR)-based method to learn options that work well in both the average and worst cases. We extend the CVaR-based policy gradient method proposed by Chow and Ghavamzadeh (2014) to deal with robust Markov decision processes and then apply the extended method to learning robust options. We conduct experiments to evaluate our method in multi-joint robot control tasks (HopperIceBlock, Half-Cheetah, and Walker2D). Experimental results show that our method produces options that 1) give better worst-case performance than the options learned only to minimize the average-case loss, and 2) give better average-case performance than the options learned only to minimize the worst-case loss.
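The average/worst-case trade-off described in this abstract can be sketched numerically. The following is a minimal illustration, not the paper's algorithm: it estimates the empirical CVaR of per-episode losses and forms a weighted combination of the mean (average-case) loss and the CVaR (worst-case) loss. The confidence level `alpha` and the mixing weight `lam` are illustrative values, not values taken from the paper.

```python
import numpy as np

def cvar(losses, alpha=0.9):
    """Empirical CVaR_alpha: mean of the worst (1 - alpha) fraction of losses."""
    var = np.quantile(losses, alpha)       # value at risk (the alpha-quantile)
    return losses[losses >= var].mean()

def mixed_objective(losses, alpha=0.9, lam=0.5):
    """Trade-off between average-case loss and worst-case (CVaR) loss.

    lam and alpha are illustrative hyperparameters, not values from the paper.
    """
    return lam * losses.mean() + (1.0 - lam) * cvar(losses, alpha)

rng = np.random.default_rng(0)
losses = rng.normal(1.0, 0.3, size=10_000)  # stand-in for per-episode losses
print(mixed_objective(losses))
```

Minimizing only the first term recovers the average-case objective; minimizing only the second recovers a worst-case (tail) objective.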


PAC-Bayesian Bound for the Conditional Value at Risk

Neural Information Processing Systems

Conditional Value at Risk ($\textsc{CVaR}$) is a ``coherent risk measure'' which generalizes expectation (reduced to a boundary parameter setting). Widely used in mathematical finance, it is garnering increasing interest in machine learning as an alternate approach to regularization, and as a means for ensuring fairness. This paper presents a generalization bound for learning algorithms that minimize the $\textsc{CVaR}$ of the empirical loss. The bound is of PAC-Bayesian type and is guaranteed to be small when the empirical $\textsc{CVaR}$ is small. We achieve this by reducing the problem of estimating $\textsc{CVaR}$ to that of merely estimating an expectation. This then enables us, as a by-product, to obtain concentration inequalities for $\textsc{CVaR}$ even when the random variable in question is unbounded.
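The reduction of CVaR estimation to estimating an expectation can be illustrated with the standard Rockafellar-Uryasev identity, CVaR_alpha(X) = min over c of { c + E[(X - c)_+] / (1 - alpha) }, whose minimizer is the alpha-quantile (the VaR). The sketch below compares the direct tail-average definition with this variational form on samples; it illustrates the identity only, not the paper's bound.

```python
import numpy as np

def cvar_direct(x, alpha):
    """CVaR as the mean of the worst (1 - alpha) tail of the samples."""
    var = np.quantile(x, alpha)
    return x[x >= var].mean()

def cvar_rockafellar(x, alpha):
    """CVaR via c + E[(x - c)_+] / (1 - alpha).

    On samples, an (approximate) minimizer c is the empirical alpha-quantile,
    so CVaR becomes a plain expectation of the clipped excess loss.
    """
    c = np.quantile(x, alpha)
    return c + np.maximum(x - c, 0.0).mean() / (1.0 - alpha)
```

The second form is what makes concentration arguments tractable: once `c` is fixed, the quantity inside the expectation is an ordinary random variable.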


PAC-Bayesian Bound for the Conditional Value at Risk

Neural Information Processing Systems

The goal in statistical learning is to learn hypotheses that generalize well, which is typically formalized by seeking to minimize the expected risk associated with a given loss function.



Review for NeurIPS paper: PAC-Bayesian Bound for the Conditional Value at Risk

Neural Information Processing Systems

Conditional Value at Risk, or Expected Shortfall (CVaR), of a random variable is the expected value of that variable conditioned on it exceeding a given threshold. For example, it quantifies the amount of tail risk an investment portfolio carries. This kind of quantity is important in many situations and is receiving growing attention in the ML community. Indeed, a learned predictor with poor average accuracy might nevertheless be of high utility if it achieves a good CVaR, provided there is a particular interest in the examples in the best quantile (e.g., the best drivers for car insurance companies ... the only ones that should qualify for a reduction of their insurance quotes). There is still a lot to understand about CVaR from the learning-theory point of view; this paper proposes the first known PAC-Bayesian bound for CVaR.
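The reviewer's point, that average loss can hide tail behavior, can be made concrete. In the sketch below, two hypothetical predictors have (essentially) identical mean loss but very different tail loss under a CVaR-style measure; all numbers are illustrative.

```python
import numpy as np

def tail_mean(losses, alpha=0.9):
    """Mean loss over the worst (1 - alpha) fraction of examples (empirical CVaR)."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

# Two hypothetical predictors' per-example losses.
a = np.full(100, 0.5)                     # uniformly mediocre
b = np.concatenate([np.full(90, 0.3),     # good on most examples...
                    np.full(10, 2.3)])    # ...bad on a 10% tail

print(a.mean(), b.mean())                 # averages are essentially equal
print(tail_mean(a), tail_mean(b))         # tails differ sharply
```

An expected-risk criterion cannot distinguish `a` from `b`; a CVaR criterion does, which is the motivation for studying its generalization behavior.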



Minimax Optimal Algorithms for Unconstrained Linear Optimization
H. Brendan McMahan, Jacob Abernethy

Neural Information Processing Systems

We design and analyze minimax-optimal algorithms for online linear optimization games where the player's choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark functions. The problem is cast as a sequential multi-stage zero-sum game, and we give a thorough analysis of the minimax behavior of the game, providing characterizations for the value of the game, as well as both the player's and the adversary's optimal strategy. We show how these objects can be computed efficiently under certain circumstances, and by selecting an appropriate benchmark, we construct a novel hedging strategy for an unconstrained betting game.
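For context, a plain online-gradient-descent player on a one-dimensional unconstrained linear game looks like the sketch below. This is a standard baseline for comparison, not the minimax-optimal strategy analyzed in the paper, and the step size `eta` is an illustrative value.

```python
import numpy as np

def ogd_unconstrained(grads, eta=0.1):
    """Online gradient descent on linear losses g_t * x_t, starting at x = 0.

    A standard baseline player, not the paper's minimax-optimal strategy.
    """
    x, plays = 0.0, []
    for g in grads:
        plays.append(x)      # play the current point
        x -= eta * g         # then update on the revealed linear loss
    return np.array(plays)

rng = np.random.default_rng(1)
g = rng.choice([-1.0, 1.0], size=1000)   # adversarial-style gradient signs
plays = ogd_unconstrained(g)
player_loss = float(np.sum(g * plays))
# Regret against a fixed comparator u is sum_t g_t x_t - u * sum_t g_t,
# which is unbounded over u for linear losses; this is why the paper
# benchmarks against a function of the comparator instead of a fixed set.
print(player_loss)
```

The unboundedness of the fixed-comparator regret is what motivates the broader family of benchmark functions considered in the abstract.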